Improved methods for the imputation of missing data by nearest neighbor methods
Abstract
Missing data is an important issue in almost all fields of quantitative research. A nonparametric procedure that has been shown to be useful is the nearest neighbor (NN) imputation method. We suggest a weighted nearest neighbor imputation method based on Lq-distances. The weighted method is shown to have smaller imputation error than available NN estimates. In addition, we consider weighted nearest neighbor imputation methods that use selected distances. The careful selection of distances that carry information on the missing values yields an imputation tool that clearly outperforms competing nearest neighbor methods. Simulation studies show that the suggested weighted imputation with selection of distances provides the smallest imputation error, in particular when the number of predictors is large. In addition, the selected procedure is applied to real data from different fields.
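The general idea of weighted nearest neighbor imputation with Lq-distances can be illustrated with a minimal sketch. This is not the authors' estimator: the Gaussian kernel, the bandwidth `lam`, the neighbor count `k`, the averaging of the distance over shared components, and the function names `lq_distance` and `weighted_nn_impute` are all assumptions made for illustration, and the distance selection step mentioned in the abstract is not included.

```python
import numpy as np


def lq_distance(a, b, q=2.0):
    """Lq distance over components observed in both rows (assumed form).

    The q-th powers are averaged over the shared components so that pairs
    sharing different numbers of components remain comparable.
    """
    shared = ~np.isnan(a) & ~np.isnan(b)
    if not shared.any():
        return np.inf  # nothing in common, so the rows cannot be compared
    return np.mean(np.abs(a[shared] - b[shared]) ** q) ** (1.0 / q)


def weighted_nn_impute(X, k=3, q=2.0, lam=1.0):
    """Fill each NaN with a kernel-weighted mean of its k nearest donors.

    Hypothetical helper: a Gaussian kernel with bandwidth `lam` turns
    distances into weights, so closer neighbors contribute more.
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()
    n = X.shape[0]
    for i in range(n):
        for j in np.flatnonzero(np.isnan(X[i])):
            # Donors are the other rows in which column j is observed.
            donors = np.array(
                [r for r in range(n) if r != i and not np.isnan(X[r, j])],
                dtype=int,
            )
            d = np.array([lq_distance(X[i], X[r], q) for r in donors])
            usable = np.isfinite(d)
            donors, d = donors[usable], d[usable]
            if donors.size == 0:
                continue  # no usable donor: leave the value missing
            nearest = np.argsort(d)[:k]
            w = np.exp(-0.5 * (d[nearest] / lam) ** 2)
            if w.sum() == 0.0:
                w = np.ones_like(w)  # fall back to an unweighted mean
            out[i, j] = np.sum(w * X[donors[nearest], j]) / w.sum()
    return out


if __name__ == "__main__":
    X = np.array([[1.0, 2.0], [1.1, np.nan], [5.0, 10.0]])
    print(weighted_nn_impute(X, k=2, q=2.0, lam=1.0))
```

With equal weights this reduces to plain k-nearest-neighbor mean imputation; the kernel weighting is what lets distant neighbors contribute less to the imputed value.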
Similar resources
Analysis of missing observations in a study of the effect of different doses of vitamin D supplementation on insulin resistance during pregnancy
Introduction: The aim of this study was to impute missing data and to compare the effect of different doses of vitamin D supplementation on insulin resistance during pregnancy. Methods: A clinical trial study was done on 104 women with diabetes and gestational age less than 12 weeks between 1391 and...
Fractional Regression Nearest Neighbor Imputation
Sample surveys typically gather information on a sample of units from a finite population and assign survey weights to the sampled units. Surveys frequently have missing values for some variables for some units. Fractional regression imputation creates multiple values for each missing value by adding randomly selected empirical residuals to predicted values. Fractional imputation methods assign ...
Missing data imputation in multivariable time series data
Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing data. Imputing missing data in multivariate time series is a challenging topic and needs to be carefully considered before learning or predicting time series. Much research has been done on the use of diffe...
A comparison study of nonparametric imputation methods
Consider estimation of a population mean of a response variable when the observations are missing at random with respect to the covariate. Two common approaches to imputing the missing values are the nonparametric regression weighting method and the Horvitz-Thompson (HT) inverse weighting approach. The regression approach includes the kernel regression imputation and the nearest neighbor imputa...
Nearest Neighbor Imputation for Categorical Data by Weighting of Attributes
Missing values are a common phenomenon in all areas of applied research. While various imputation methods are available for metrically scaled variables, methods for categorical data are scarce. An imputation method that has been shown to work well for high dimensional metrically scaled variables is the imputation by nearest neighbor methods. In this paper, we extend the weighted nearest neighbo...
Journal title:
- Computational Statistics & Data Analysis
Volume 90, Issue
Pages -
Publication date 2015